WIP+ENH: Add REMAP action for automatically remapping (instance) UIDs #203

moloney · 2022-03-17T23:16:41Z

Description

This PR improves (instance) UID handling by allowing UIDs to be automatically remapped through a new "REMAP" action. The new UIDs are created in a way that strips all PHI (usually timestamps embedded in the UID) while preserving the study/series hierarchy (defined by the UIDs) in the anonymized output. This allows the anonymized data to be, for example, uploaded to a PACS or other network entity.

Checklist

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
My code follows the style guidelines of this project

Open questions

I also updated the default recipe to handle UIDs in this way. If you'd prefer I not do that (at least for now), let me know.

This maintains study / series relationships, since the same input UID always produces the same output UID, while removing any potential PHI (the output is generated with a cryptographically secure hash function).
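To show why hash-based remapping keeps the hierarchy intact, here is a minimal stdlib-only sketch. It is loosely modelled on the idea behind pydicom's `generate_uid(entropy_srcs=[...])` and is not that function. `hash_uid` and the `1.2.3.4.` prefix are hypothetical placeholders; a real deployment would use its own UID root.

```python
import hashlib


def hash_uid(uid: str, prefix: str = "1.2.3.4.") -> str:
    """Deterministically map an input UID to a new UID (hypothetical helper).

    SHA-512 cannot be inverted directly, so the original UID (and any
    timestamp embedded in it) is not readable from the output.
    """
    digest = hashlib.sha512(uid.encode("utf-8")).hexdigest()
    # Render the digest as a decimal integer so the result contains only
    # digits and dots, then truncate to the 64-character DICOM UID limit.
    return (prefix + str(int(digest, 16)))[:64]


if __name__ == "__main__":
    study = "1.2.840.113619.2.55.3.20220317101500"  # hypothetical input UID
    print(hash_uid(study))
    # Same input -> same output, so study/series links survive remapping
    print(hash_uid(study) == hash_uid(study))
```

Because the mapping is a pure function of the input UID, every instance that referenced the same StudyInstanceUID before anonymization still shares one (new) StudyInstanceUID afterwards.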

vsoch · 2022-03-17T23:20:17Z

Hey, thanks for the PR!

So this seems very hard-coded for your specific use case (which is okay), but I want to challenge you to think of this more generally. What you are really doing is running a custom function for a specific VR value, e.g.:

field.element.VR == 'UI'

So why is this not just a replace with a custom function? The function is handed the field so you'd just do that check within the function and act appropriately. Is there a reason this approach will not work?

vsoch · 2022-03-17T23:23:53Z

E.g.:

REPLACE contains:((?<!SOPClass)UID) func:remap_uid

from pydicom.multival import MultiValue
from pydicom.uid import generate_uid


def remap_uid(item, value, field):
    """Remap an existing UID in a stable and secure manner.

    The same input UID always creates the same output UID, which keeps
    study / series associations (and even FrameOfReferenceUID) correct,
    provided the full study / series is anonymized. At the same time, the
    input UID can't be recovered from the output, so any potential PHI
    (e.g. dates / times) is eliminated.
    """
    element = field.element
    if element.VR != 'UI':
        # Not a UID: leave the original value untouched
        return element.value

    if isinstance(element.value, (list, MultiValue)):
        # Handle VM > 1: remap each UID independently
        return [generate_uid(entropy_srcs=[str(x)]) for x in element.value]

    # generate_uid hashes the entropy sources, so the output is deterministic
    return generate_uid(entropy_srcs=[str(element.value)])
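The `contains:((?<!SOPClass)UID)` filter uses a negative lookbehind, presumably so that SOP Class UIDs (which identify the kind of object rather than a specific instance) are not remapped. A quick check of what the pattern selects with Python's `re` module, assuming the filter is matched against the element keyword; `is_remapped` is a hypothetical helper:

```python
import re

# Pattern from the recipe line: "UID" not immediately preceded by "SOPClass"
UID_PATTERN = re.compile(r"(?<!SOPClass)UID")


def is_remapped(keyword: str) -> bool:
    """Return True if the recipe's contains: pattern matches this keyword."""
    return UID_PATTERN.search(keyword) is not None


if __name__ == "__main__":
    for name in ["StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID",
                 "FrameOfReferenceUID", "SOPClassUID", "ReferencedSOPClassUID"]:
        print(f"{name}: {is_remapped(name)}")
```

Note that the lookbehind only excludes names where "UID" directly follows "SOPClass"; any other keyword containing "UID" is selected.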

moloney · 2022-03-17T23:38:52Z

I agree this could just be a separate function rather than a dedicated "action". The line is a little blurry, though; JITTER could just be a function too, right? Does the 'deid' package provide a library of useful functions like this that could then be referenced in the default recipe? I think this remapping behavior for UIDs is a much better default than stripping them, as the latter produces essentially "broken" DICOM files.

vsoch · 2022-03-17T23:50:27Z

The difference is that JITTER is an explicit but general action, and one that is commonly requested in this process so that the dates aren't the same. REMAP isn't really intuitive (arguably it could describe something else), and it's on par with a custom UID function. So in my mind, the two are fairly different.

Does the 'deid' package provide a library of useful functions like this that could then be referenced in the default recipe? I think this remapping behavior for UIDs is a much better default than stripping them, as the latter produces essentially "broken" DICOM files.

But I love this idea! It would indeed be really useful to provide custom functions and make them easy to contribute and then use. If you want to talk about a design to support that (making it easier for the user, as opposed to needing to roll their own as you have), I'd definitely be open to this new addition. Perhaps the current deid/dicom/actions.py could be turned into a module folder, e.g.:

deid/dicom/actions/
    __init__.py
    jitter.py             <- the current jitter timestamp function

and JITTER would be grandfathered in as its own term (for the reasons I specified), but technically it's still an action and could go alongside the other actions (I'll leave the organization up to you; I optimize to make it easier for the developer to find things). Then perhaps we could have a uids set of actions:

deid/dicom/actions/
    __init__.py
    jitter.py             <- the current jitter timestamp function
    uids.py

Into which we can put the documentation example, along with your function here! And then for usage:

REPLACE contains:((?<!SOPClass)UID) deid_func:remap_uid

and all the custom functions would be imported into __init__.py and thus be accessible via deid.dicom.actions.<name>. You'd parse that the value is a deid_func, get the function name string, and then use importlib to get the function. Then we would have a nice table of custom functions, plus instructions for adding a new one (basically writing a new module in that folder).
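The lookup step of this proposal could look roughly like the sketch below. `resolve_deid_func`, the `deid_func:` prefix handling, and the default module path are hypothetical, following the design proposed in this thread rather than an existing deid API; only `importlib.import_module` and `getattr` are standard Python.

```python
import importlib
from typing import Callable

DEID_FUNC_PREFIX = "deid_func:"


def resolve_deid_func(value: str, module: str = "deid.dicom.actions") -> Callable:
    """Turn a recipe value like 'deid_func:remap_uid' into a callable.

    Hypothetical helper: assumes the functions are re-exported from the
    package's __init__.py so they are attributes of `module`.
    """
    if not value.startswith(DEID_FUNC_PREFIX):
        raise ValueError(f"not a deid_func value: {value!r}")
    name = value[len(DEID_FUNC_PREFIX):].strip()
    actions = importlib.import_module(module)
    func = getattr(actions, name, None)
    if not callable(func):
        raise ValueError(f"{module} has no function named {name!r}")
    return func


if __name__ == "__main__":
    # Demonstrated with a stdlib module, since deid may not be installed
    sqrt = resolve_deid_func("deid_func:sqrt", module="math")
    print(sqrt(16.0))
```

Raising a `ValueError` for unknown names keeps a typo in a recipe from silently skipping the replacement.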

Let me know your thoughts! That's just a quick idea for a design.

moloney · 2022-03-18T00:13:08Z

I agree, that is a better approach and will make it much easier to add more custom functions in the future. I will close this and open a different PR.

vsoch · 2022-03-18T00:26:14Z

Awesome! Looking forward to seeing it - please ping me if you want to have any discussion, etc.

wetzelj · 2022-03-18T12:48:12Z

This is a wonderful idea!

vsoch · 2022-04-14T23:23:36Z

@moloney and @wetzelj please see #208 !

moloney added 2 commits March 17, 2022 15:36

ENH: Add REMAP action for automatically remapping (instance) UIDs

6caaa1a

Maintains study / series relationships since the same input UID always produces the same output UID, while removing any potential PHI (output is generated with a cryptographically secure hash function).

TST: Fix existing test, add one for UID remapping

274b54e

moloney closed this Mar 18, 2022

This was referenced Apr 11, 2022

feat: copy the RSNA anonymizer protocol #206 (Open)

Add deid provided functions #207 (Closed)
